Periodic register saturation in innermost loops
نویسندگان
چکیده
This article treats register constraints in high performance embedded VLIW computing, aiming to decouple register constraints from instruction scheduling. It extends the register saturation (RS) concept to periodic instruction schedules, i.e., software pipelining (SWP). We formally study an approach which consists in computing the exact upper-bound of the register need for all the valid SWP schedules of an innermost loop (without overestimation), independently of the functional unit constraints. We call this upper-limit the periodic register saturation (PRS) of the data dependence graph (DDG). PRS is well adapted to situations where spilling is not a favourite solution for reducing register pressure compared to ILP scheduling: spill operations request memory data with a higher energy consumption. Also, spill code introduces unpredictable cache effects and makes Worst-Case-ExecutionTime (WCET) estimation less accurate. PRS is a computer science concept that has many possible applications. First, it provides compiler designers new ways to generate better codes regarding register optimisation by avoiding useless spilling. Second, its computation can help architectural designers to decide about the most suitable number of available registers inside an embedded VLIW processor; such architectural decision can be done with full respect to instruction level parallelism extraction, independently of the chosen functional units configuration. Third, it can be used to verify prior to instruction scheduling that a code does not require spilling. In this paper, we write an appropriate mathematical formalism for the problem of PRS computation and reduction in the case of loops where data dependence graphs may be cyclic. We prove that PRS reduction is NP-hard and we provide optimal and approximate methods. We have implemented our methods, and our experimental results demonstrate the practical usefulness and efficiency of the PRS approach.
منابع مشابه
Deciding Innermost Loops
We present the first method to disprove innermost termination of term rewrite systems automatically. To this end, we first develop a suitable notion of an innermost loop. Second, we show how to detect innermost loops: One can start with any technique amenable to find loops. Then our novel procedure can be applied to decide whether a given loop is an innermost loop. We implemented and successful...
متن کاملInstruction Fetch Energy Reduction Using Forward-Branch Bufferable Innermost Loop Buffer
Recently, several loop buffer designs have been proposed to reduce instruction fetch energy due to size and location advantage of loop buffer. Nevertheless, on design complexity dictates most loop buffer designs to store only innermost loops without forward branch or instructions within innermost loops before a forward branch. While program modeling shows that typical programs can best be repre...
متن کاملModulo Scheduling with Reduced Register Pressure
Software pipelining is a scheduling technique that is used by some product compilers in order to expose more instruction level parallelism out of innermost loops. Modulo scheduling refers to a class of algorithms for software pipelining. Most previous research on modulo scheduling has focused on reducing the number of cycles between the initiation of consecutive iterations (which is termed II) ...
متن کاملSoftware pipelining of nested loops for real-time DSP applications
Modem DSP Processors have been integrated with InsrrucrionLevel Purullelism(ILP), which presents a challenge to exploit ILP within DSP applications. Software Pipelining is an efficient tcchnique used to expose ILP for loop programs and has been widely used for current microprocessors. It has been recently used in DSP compilers, but only for the innermost loops. This paper proposes a new approac...
متن کاملA Tiling Perspective for Register Optimization
Register allocation is a much studied problem. A particularly important context for optimizing register allocation is within loops, since a significant fraction of the execution time of programs is often inside loop code. A variety of algorithms have been proposed in the past for register allocation, but the complexity of the problem has resulted in a decoupling of several important aspects, in...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Parallel Computing
دوره 35 شماره
صفحات -
تاریخ انتشار 2009